A general approach for discriminative de novo motif discovery from high-throughput data
نویسندگان
چکیده
De novo motif discovery has been an important challenge of bioinformatics for the past two decades. Since the emergence of high-throughput techniques like ChIP-seq, ChIP-exo and protein-binding microarrays (PBMs), the focus of de novo motif discovery has shifted to runtime and accuracy on large data sets. For this purpose, specialized algorithms have been designed for discovering motifs in ChIP-seq or PBM data. However, none of the existing approaches work perfectly for all three high-throughput techniques. In this article, we propose Dimont, a general approach for fast and accurate de novo motif discovery from high-throughput data. We demonstrate that Dimont yields a higher number of correct motifs from ChIP-seq data than any of the specialized approaches and achieves a higher accuracy for predicting PBM intensities from probe sequence than any of the approaches specifically designed for that purpose. Dimont also reports the expected motifs for several ChIP-exo data sets. Investigating differences between in vitro and in vivo binding, we find that for most transcription factors, the motifs discovered by Dimont are in good accordance between techniques, but we also find notable exceptions. We also observe that modeling intra-motif dependencies may increase accuracy, which indicates that more complex motif models are a worthwhile field of research.
منابع مشابه
W-ChIPMotifs: a web application tool for de novo motif discovery from ChIP-based high-throughput data
UNLABELLED W-ChIPMotifs is a web application tool that provides a user friendly interface for de novo motif discovery. The web tool is based on our previous ChIPMotifs program which is a de novo motif finding tool developed for ChIP-based high-throughput data and incorporated various ab initio motif discovery tools such as MEME, MaMF, Weeder and optimized the significance of the detected motifs...
متن کاملA Feature-Based Approach to Modeling Protein–DNA Interactions
Transcription factor (TF) binding to its DNA target site is a fundamental regulatory interaction. The most common model used to represent TF binding specificities is a position specific scoring matrix (PSSM), which assumes independence between binding positions. However, in many cases, this simplifying assumption does not hold. Here, we present feature motif models (FMMs), a novel probabilistic...
متن کاملFactorbook Motif Pipeline: A de novo motif discovery and filter- ing web server for ChIP-seq peaks
Summary: High-throughput sequencing technologies such as ChIPseq have deepened our understanding in many biological processes. De novo motif search is one of the key downstream computational analysis following the ChIP-seq experiments and several algorithms have been proposed for this purpose. However, most webbased systems do not perform independent filtering or enrichment analyses to ensure t...
متن کاملModel-Free RNA Sequence and Structure Alignment Informed by SHAPE Probing Reveals a Conserved Alternate Secondary Structure for 16S rRNA
Discovery and characterization of functional RNA structures remains challenging due to deficiencies in de novo secondary structure modeling. Here we describe a dynamic programming approach for model-free sequence comparison that incorporates high-throughput chemical probing data. Based on SHAPE probing data alone, ribosomal RNAs (rRNAs) from three diverse organisms--the eubacteria E. coli and C...
متن کاملCompleteMOTIFs: DNA motif discovery platform for transcription factor binding experiments
UNLABELLED CompleteMOTIFs (cMOTIFs) is an integrated web tool developed to facilitate systematic discovery of overrepresented transcription factor binding motifs from high-throughput chromatin immunoprecipitation experiments. Comprehensive annotations and Boolean logic operations on multiple peak locations enable users to focus on genomic regions of interest for de novo motif discovery using to...
متن کامل